purrr Tutorial




Overview

In this tutorial on the purrr package in R, you will learn how to use functions from purrr to improve the quality of your code, and you will see the advantages of purrr functions over their base R equivalents.


Is R a Functional Programming Language?

Most of us don’t pay attention to such questions or features of a programming language. However, I have realized that this understanding is fundamental to writing efficient and effective code that is easy to understand and execute. Although R is not a purely functional language, it does have technical properties that let us structure our code around solving problems with functions. To learn more about functional programming in R, I encourage you to read the Advanced R book by Hadley Wickham. For now, we will continue with our tutorial covering essential functions from the purrr package in R.


Installing purrr package

The purrr package can be installed in three different ways. Because it is part of the tidyverse, the easiest option is to install the tidyverse package. Alternatively, you can install purrr on its own from CRAN, or install the development version directly from GitHub using the install_github() function from the devtools package.

# The easiest way: install the tidyverse
install.packages("tidyverse")

# Install just purrr
install.packages("purrr")

# Install the development version directly from GitHub
# install.packages("devtools")
devtools::install_github("tidyverse/purrr")

# Load the package before running the examples below
library(purrr)

The purrr package is best known for its map() family, a consistent replacement for the base R apply functions, and more generally for providing a consistent set of tools for working with functions and vectors in R. So, let’s start the purrr tutorial with these apply-style functions.


Eliminating for loops using map() function

Just like the apply family of functions in base R (apply(), lapply(), tapply(), vapply(), etc.), the purrr package provides functions that solve similar problems but are more consistent and easier to learn. The consistency here refers to the output type: map() always returns a list. We will look at the following three functions:

map() – Use when you want to apply a function to each element of a list or vector.
map2() – Use when you want to apply a function to pairs of elements from two lists or vectors.
pmap() – Use when you want to apply a function to groups of elements taken from a list of lists (or from the columns of a data frame).

The examples below will help you understand each function better. The goal of using purrr functions instead of a regular for loop is to break a complex problem into smaller, independent pieces.
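To make the "eliminating for loops" idea concrete, here is a minimal sketch that computes the same list of squares twice: once with an explicit for loop and once with map(). The variable names (values, squares_loop, squares_map) are illustrative only.

```r
library(purrr)

values <- c(2, 4, 5, 6)

# for loop version: we pre-allocate the output list and manage the index ourselves
squares_loop <- vector("list", length(values))
for (i in seq_along(values)) {
  squares_loop[[i]] <- values[i]^2
}

# map() version: purrr handles the iteration and builds the output list
squares_map <- map(values, function(v) v^2)

# Both approaches produce exactly the same list
identical(squares_loop, squares_map)
#> [1] TRUE
```

The map() version has no index bookkeeping or pre-allocation. Each element is handled by an independent function call.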

Example map() function

In the example below, we apply a user-defined square() function to each element of a vector. As mentioned above, the output is a list.

library(purrr)

# Define a function that returns the square of its input
square <- function(x) {
  return(x * x)
}

# Create a numeric vector
vector1 <- c(2, 4, 5, 6)

# Use map() to compute the squares
map(vector1, square)
#> [[1]]
#> [1] 4
#>
#> [[2]]
#> [1] 16
#>
#> [[3]]
#> [1] 25
#>
#> [[4]]
#> [1] 36
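As a variation on the example above, you do not have to name the function first. The sketch below passes a formula lambda (~ .x * .x, where .x is the current element). It then uses map_dbl(), one of purrr's typed variants of map(), which returns a double vector instead of a list. The name squares is illustrative.

```r
library(purrr)

vector1 <- c(2, 4, 5, 6)

# Formula lambda: .x stands for the current element; the result is still a list
map(vector1, ~ .x * .x)

# map_dbl() applies the same function but returns a double vector
squares <- map_dbl(vector1, ~ .x * .x)
squares
#> [1]  4 16 25 36
```

Use map() when you want a list back, and use a typed variant when you know every call returns a single value of one type.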

Example map2() function

Sometimes a calculation involves two vectors or lists. In that case, you can use the map2() function. The two inputs must have the same length (an input of length 1 is recycled). Otherwise, map2() throws an error about the inconsistent lengths.

Let’s say we have two vectors, x and y, and we want to compute x to the power of y. First, we define a function that returns the desired output. Then we use map2() to apply it to each pair of elements.

x <- c(2, 4, 5, 6)
y <- c(2, 3, 4, 5)

# Raise x to the power of y
to_Power <- function(x, y) {
  return(x^y)
}

map2(x, y, to_Power)
#> [[1]]
#> [1] 4
#>
#> [[2]]
#> [1] 64
#>
#> [[3]]
#> [1] 625
#>
#> [[4]]
#> [1] 7776

You do not have to define a named function first. You can pass an anonymous function as a formula (~ .x + .y), or you can pass an arithmetic operator such as `+` directly, because operators are functions in R. Say we want the sum of each pair of values in x and y:

map2(x, y, ~ .x + .y)
# Or just
map2(x, y, `+`)
#> [[1]]
#> [1] 4
#>
#> [[2]]
#> [1] 7
#>
#> [[3]]
#> [1] 9
#>
#> [[4]]
#> [1] 11
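The length rule for map2() can be checked directly. The sketch below shows that a length-1 input is recycled. It also shows that inputs of length 4 and 3 raise an error, which tryCatch() captures so the script keeps running. The exact error text differs between purrr versions, so it is not reproduced here. The names powers_of_two, y_short and msg are illustrative.

```r
library(purrr)

x <- c(2, 4, 5, 6)

# A length-1 input (here the exponent 2) is recycled across all elements of x
powers_of_two <- map2(x, 2, ~ .x^.y)

# Incompatible lengths (4 and 3) raise an error; capture its message instead of stopping
y_short <- c(2, 3, 4)
msg <- tryCatch(
  map2(x, y_short, ~ .x^.y),
  error = function(e) conditionMessage(e)
)
msg
```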

Example pmap() function (parallel map)

Using the pmap() function, you can map a function over multiple inputs simultaneously: the i-th elements of every input are passed together to a single function call. The word "parallel" does not mean that the work is spread across multiple CPU cores. The example below is only for illustration, and the calculation may not make sense in business terms, but that’s fine. Here we compute the sum of the mpg, hp and disp variables for each row of the mtcars dataset using pmap().

mtcars_sub <- mtcars[1:5, c("mpg", "hp", "disp")]
pmap(mtcars_sub, sum)
#> [[1]]
#> [1] 291
#>
#> [[2]]
#> [1] 291
#>
#> [[3]]
#> [1] 223.8
#>
#> [[4]]
#> [1] 389.4
#>
#> [[5]]
#> [1] 553.7

The inputs can also be supplied as a list of lists, and arguments are matched either by position or by name:

x <- list(1, 1, 1)
y <- list(10, 20, 30)
z <- list(100, 200, 300)

pmap(list(x, y, z), sum)

# Matching arguments by position
pmap(list(x, y, z), function(first, second, third) (first + third) * second)
#> [[1]]
#> [1] 1010
#>
#> [[2]]
#> [1] 4020
#>
#> [[3]]
#> [1] 9030

# Matching arguments by name
l <- list(a = x, b = y, c = z)
pmap(l, function(c, b, a) (a + c) * b)
#> [[1]]
#> [1] 1010
#>
#> [[2]]
#> [1] 4020
#>
#> [[3]]
#> [1] 9030

Base R functions such as sapply() may return a vector, a matrix or a list depending on the input. In contrast, map(), map2() and pmap() always return a list, so you don’t have to worry about the output type.
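Because a data frame is a list of columns, pmap() can also match column names to argument names, as in the "matching by name" example above. The sketch below reuses mtcars_sub and the function arguments mpg, hp and disp. It uses pmap_dbl(), a typed variant of pmap() that returns a double vector of row totals instead of a list. The name row_totals is illustrative.

```r
library(purrr)

mtcars_sub <- mtcars[1:5, c("mpg", "hp", "disp")]

# Each row supplies one call; the columns mpg, hp and disp are matched
# to the function arguments of the same names
row_totals <- pmap_dbl(mtcars_sub, function(mpg, hp, disp) mpg + hp + disp)
row_totals
```

Naming the arguments makes the intent clearer than relying on column order, and a reordering of the columns will not silently change the result.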


Working with lists using purrr package

Because most purrr functions return a list, you need to know how to work with lists productively. List-related tasks fall into five groups:

Filtering lists
Summarizing lists
Transforming lists
Reshaping lists
Joining or combining lists

We will now look at several functions and tasks within each group.


Filtering Lists

The functions we find most helpful here are:

pluck() or chuck() – With these functions, you can extract a particular element from a list by its name or index. The only difference between them is what happens when the element is not present in the list: pluck() returns NULL, whereas chuck() throws an error. Let us look at the example below:

ls1 <- list("R", "Statistics", "Blog")
pluck(ls1, 2)
#> [1] "Statistics"

If you pass the index 4, which does not exist in the list, pluck() returns NULL.

pluck(ls1, 4)
#> NULL

Go ahead and experiment with the chuck() function for better understanding and practice.

keep() – A handy function that, as the name suggests, keeps only those elements of a list that pass a logical test. Here we keep only the elements greater than five.

ls2 <- list(23, 12, 14, 7, 2, 0, 24, 98)
keep(ls2, function(x) x > 5)
#> [[1]]
#> [1] 23
#>
#> [[2]]
#> [1] 12
#>
#> [[3]]
#> [1] 14
#>
#> [[4]]
#> [1] 7
#>
#> [[5]]
#> [1] 24
#>
#> [[6]]
#> [1] 98

discard() – The opposite of keep(): it drops the elements that pass the logical test. For example, to drop NA values, we can use is.na() to discard the elements of the list that are NA.

ls3 <- list(23, NA, 14, 7, NA, NA, 24, 98)
discard(ls3, is.na)
#> [[1]]
#> [1] 23
#>
#> [[2]]
#> [1] 14
#>
#> [[3]]
#> [1] 7
#>
#> [[4]]
#> [1] 24
#>
#> [[5]]
#> [1] 98

compact() – A simple, straightforward function that drops all NULL values from a list. Do not confuse NA values with NULL values. They are different things in R: NA marks a missing value, while NULL represents the absence of a value altogether.

ls4 <- list(23, NULL, NA, 34)
compact(ls4)
#> [[1]]
#> [1] 23
#>
#> [[2]]
#> [1] NA
#>
#> [[3]]
#> [1] 34

head_while() – An interesting function that checks the logical condition for each element, starting from the top of the list. It returns the leading elements up to (but not including) the first element that fails the test. In the example below, we check whether each element is a character string.

ls5 <- list("R", "Statistics", "Blog", 2, 3, 1)
head_while(ls5, is.character)
#> [[1]]
#> [1] "R"
#>
#> [[2]]
#> [1] "Statistics"
#>
#> [[3]]
#> [1] "Blog"

If you are interested in the trailing elements instead, purrr provides the tail_while() function. This concludes the list filtering functions. They are among the most common functions you will use in day-to-day work.
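As a starting point for the chuck() experiment suggested above, the sketch below contrasts chuck() with pluck() on the same list. It uses tryCatch() to turn the out-of-range error into a fixed string. It then applies tail_while() to ls5 to collect the trailing numeric elements. The name res and its message string are illustrative.

```r
library(purrr)

ls1 <- list("R", "Statistics", "Blog")

# For an existing index, chuck() behaves like pluck()
chuck(ls1, 2)
#> [1] "Statistics"

# For a missing index, chuck() signals an error instead of returning NULL
res <- tryCatch(chuck(ls1, 4), error = function(e) "chuck() raised an error")
res
#> [1] "chuck() raised an error"

# tail_while() keeps elements from the end of the list until the test first fails
ls5 <- list("R", "Statistics", "Blog", 2, 3, 1)
tail_while(ls5, is.numeric)
```

chuck() is useful when a missing element indicates a bug: failing loudly is safer than letting a NULL propagate through later steps.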


Summarising Lists

purrr provides several functions for summarizing lists. In this purrr tutorial, we will cover the five most widely used ones.

every() – Returns TRUE if all the elements in a list pass a test. In the example below, every() returns FALSE because one of the elements is not a character string.

sm1 <- list("R", 2, "Rstatistics", "Blog")
every(sm1, is.character)
#> [1] FALSE

some() – Similar to every() in that it checks a condition against all the elements of a list. However, it returns TRUE if even one element passes the test.

sm2 <- list("R", 2, "Rstatistics", "Blog")
some(sm2, is.character)
#> [1] TRUE

has_element() – Returns TRUE if the list contains the given element.

has_element(sm2, 2)
#> [1] TRUE

detect() – Returns the first element that passes the test. Note that it returns the element itself, not its position. Below we look for numeric elements. The list contains two numeric elements, but the function returns only the first one, i.e., 2.

sm3 <- list("R", 2, "Rstatistics", "Blog", 3)
detect(sm3, is.numeric)
#> [1] 2

detect_index() – Like detect(), this function checks for elements that pass the test, but it returns the index of the first such element.

sm4 <- list(2, "Rstatistics", "Blog", TRUE)
detect_index(sm4, is.logical)
#> [1] 4
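Two edge cases of detect() and detect_index() are easy to miss. If no element passes the test, detect() returns NULL and detect_index() returns 0. In recent versions of purrr, the .dir argument can also search from the end of the list. The sketch below reuses sm3 and sm4 from above and uses is.function purely as a test that no element passes.

```r
library(purrr)

sm3 <- list("R", 2, "Rstatistics", "Blog", 3)
sm4 <- list(2, "Rstatistics", "Blog", TRUE)

# No element of sm4 is a function: detect() gives NULL ...
detect(sm4, is.function)
#> NULL

# ... and detect_index() gives 0
detect_index(sm4, is.function)
#> [1] 0

# Searching backward returns the last numeric element instead of the first
detect(sm3, is.numeric, .dir = "backward")
#> [1] 3
```

Check for these sentinel values (NULL and 0) before using the result, for example before indexing a list with the output of detect_index().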


Reshaping Lists

Flattening and transposing lists are two tasks you will find yourself doing regularly as part of data wrangling. If you have made it this far in the tutorial, you know that you will flatten lists often, since most purrr functions return lists. You can carry out these tasks with the following functions.

flatten() – Removes one level of hierarchy from a list of lists. The closest base R equivalent is the unlist() function. Although the two are similar, flatten() removes only a single level of hierarchy and is type-stable, which means you always know the type of the output. There are also sub-group functions that guarantee the output type you want. The sub-group functions are as mentioned below: